ABSTRACT



This study examines the effects of peer evaluation on the writing
performance and attitudes of 9th grade students.
Research on peer evaluation and an extensive (quasi-)
experiment are described. The results of the
experiment are fairly unambiguous. Peer feedback and
teacher feedback produced no differences in Writing
Performance or Psychological Variables. It also emerged that
sex and proficiency level show little or no effect in relation
to type of feedback. In the discussion an attempt is made to
explain the results.
1. INTRODUCTION



There has long been an impression that school students can
learn a lot from each other. Quite a few teachers and
researchers have investigated whether this is in fact the
case. During the 70s in particular, a new view of learning
(interactive instead of monological) and of writing
(communicative instead of modeling) led to numerous
studies on the effects of peer evaluation. The old idea that
skills are developed by practice and the fact that mother
tongue teachers always end up with vast piles of correction
work have been fertile ground for experimentation with peer
evaluation. In this article we report on a field experiment
in the Netherlands (Rijlaarsdam, 1986). The results
correspond to what has been found in American research. In
the discussion we will look more closely at these results as
well as at the notion that emerges from the literature, viz.
that peer evaluation works.

Why should peer evaluation have a positive effect on the
development of written composition skills? There is no theory
to explain this, even though much empirical research has
been done (see xxx), and advocates of peer evaluation as a
didactic measure in the teaching of written composition, such
as Bruffee (1980), Elbow (1974) and Moffett (1968), base their
arguments on suppositions of varying scientific validity. In
fact, all there is is a common sense theory.

In the case of reading skills, however, some attempts have
been made to formulate a scientific explanation for the
results of peer teaching on reading performance (Bloom,
1976). In her monograph Bloom draws on the principles of
Dollard and Miller (1950), later to become so important in
mastery learning: cues, participation and reinforcement.
Because of the one-to-one character of the interaction and
the special relation between students, she argued, both
the students giving tuition and the students receiving
tuition learn a great deal from the peer setting.
This explanation is plausible, but it cannot be transferred to
teaching written composition, which is, after all, already
highly individualized, especially in regard to feedback.

Others (Sarbin, 1976; Sarbin & Allen, 1968) have tried to 
explain the effects of peer teaching from the angle of role 
taking theory. They postulate that the nature of the relation 
between students is different from that between students and 
teachers, and that because of this they reward one another 
differently (more effectively). Taking the role of teacher 
puts students in a position in which they can have the
feelings and experiences that go with such a role: prestige,
authority, competency. This can lead to a positive 
self-conception (Bandura, 1982; Weiner, 1974). 

However, neither theory, whether that of Bloom or
that of Sarbin & Allen, can easily be applied to the teaching
of written composition. The teaching of written composition
is already quite highly individualized, and the role of
positive motivation in a cognitive skill like writing is
debatable. Moreover, the teaching of writing differs from
other domains of instruction because writing is a
communicative act. Nevertheless, as we have already observed, many
teachers did not allow the absence of a scientific didactic
theory explaining these effects of peer evaluation in the
teaching of writing to prevent them from applying the
principle in practice. We will analyze publications on peer
evaluation and written composition ability for causal
assertions and from these construct a common sense
theory. Then we will briefly examine the empirical data
available to us.

2. COMMON SENSE THEORY 

We have made an inventory of causal assertions about elements
of peer evaluation and written composition ability in a
variety of ways. We have interviewed eleven teachers using
peer evaluation and their students (Triesscheijn, Bochardt &
Rijlaarsdam, 1984). We have analyzed learner reports of
students who were confronted intensively with peer evaluation
in the written composition course they received (Rijlaarsdam,
1985). By means of a computer search (see appendix 1) we have
analyzed articles by teachers and mother tongue educationists
(Rijlaarsdam, 1984). The following is a highly condensed
summary of our findings.

The process of peer evaluation, applied to the teaching of
written composition, can be roughly divided into four stages:
1. writing; 2. reading; 3. commenting; 4. receiving comments.
These four stages make up two complementary pairs. In the
first two stages, the communicative pair of writing and
reading, the communicative act is dominant: there is very
little intentional learning (see Rijlaarsdam & Hulshof, 1984,
p. 190). In the second, instructive, pair, two students
find themselves in a communicative relation towards each
other, but now they are in the roles of instruction giver and
instruction receiver. An instructive message is sent: the
response to or comment on the essay. At any given moment a
student plays at least two roles in this teaching/learning
process. When he is writing, he is also the audience for
another student. When he is reading, he reads as a reader but
also as a student; he is reading partly in order to learn
from his reading. When he is commenting, he is also the
addressee for another student. When he is receiving comments,
he is a commentator for another student. These relations are
illustrated schematically in Figure 1.



Figure 1: A diagram of student response 
Writing. Compared to teacher feedback, the task situation of
the writer is characterized by three elements. (1) First and
foremost, the essay is meant to be read; the stress is not on
gaining marks or grades. Thus the texts will function
communicatively rather than as school exercises. And because
content is now more important than formal aspects, it might
well be the case that the writers will experience the task
situation as less threatening. (2) There will be readers. To
avoid being misunderstood, the writer will have to pay more
attention to careful formulation and editing. (3) Readers are
peers, not teachers, so that it is easier for writers to pass
on new content to readers.

Reading. By reading one another's texts students experience 
the natural reactions of readers: personal preferences, 
points of view, and prior knowledge all prove to play a part. 
This knowledge, as well as knowledge gained through a natural
form of modeling (text models, vocabulary etc.), might
play a part in the next writing task. Students experience the
dynamics of communication and knowledge: writing tasks can be 
tackled in very different ways, all rhetorically effective. 
Reading large numbers of texts offers plenty of reading 
experience so that students become more perceptive, first of 
one another's and then of their own texts. This will lead to 
more intensive and more careful correction and rewriting. 

Commenting. In fact, commenting is a very realistic writing
task, with a real-life audience. The implicit or explicit
criteria acquired through commenting on essays will play a
part in the writing of texts and in understanding the
comments made by others. Students learn to view texts as
coaches on the sideline of the communication between writer
and reader. It might well be the case that this distancing
transfers to one's own writing process.

The commenting process of students has recently been the 
subject of study. The studies concerned show that some 
progression can be detected in the aspects students pay 
attention to (Hilgers, 1984) and that there are clear signs 
of the effect of teaching on the nature of the comments given 
(Hilgers, 1984; Ziv, 1983; Rubin, 1983). What causes problems 
for students is the conflict of roles between coach and 
communicator (Newkirk, 1984a; 1984b). They tend to allow 
themselves to be distracted by the subject of the essay and 
to read it in a 'filling-in' way, as a result of which they
are less likely to notice structural shortcomings in the
text. Teachers, by contrast, make the text do the work. Then
again, students prove to be less flexible than teachers when 
it comes to applying the models they have learned: a very 
individual text was greatly appreciated by teachers, while 
students rejected the same text because it did not conform to 
the models they had learned at school. 

Receiving comments. The feedback situation differs from
teacher feedback in three respects: the number of feedback
givers, the speed of the feedback, and the person of the
feedback giver. The number of feedback messages means that
the students are less dependent on a single judgement and
that they have to take more responsibility for
themselves in selecting from and accepting feedback. As a
rule, students expect more from fast feedback than from
delayed feedback. At the same time, the fact that the
feedback is given by the intended readers means that the
receivers regard the feedback as valid. It is accordingly
more likely to be taken to heart. It also becomes apparent to
the students that clarity and grammaticality are not merely
the professional interest of the teacher but that these are
also of communicative importance if one is to be properly
understood.

The process of assimilating comments has been investigated 
in a number of studies. Jones (1977) observes that in 50% of 
cases students rightly reject comments. All sorts of factors 
play a part in this process. Rubin (1983) concludes that 
students need time to convert the acquired critical skills 
into skill at textual revision. Davis (1982) demonstrates 
that giving oral comments does not work well. Stone
(1981), Ziv (1983) and Jones (1977) examined the respects in
which the comments of peers were accepted: comments regarding
content were least likely to be accepted by writers. The same
result was obtained by a process study in which students
revised their texts while thinking aloud (Bochardt &
Rijlaarsdam, 1984). Differences in accepting and rejecting
comments may also be due to differences between writers. Good
writers do not confine themselves to superficial features
(Stone, 1981). Berkenkotter (1984) shows that the personality
of writers (autonomy) has an effect on how they deal with
criticism from other students.



3. EMPIRICAL EVIDENCE 

Table 2 summarizes the results of 21 experimental or 
quasi-experimental studies with peer evaluation as an 
independent variable and writing ability and/or writing 
attitudes as dependent variable(s). 

Table 2: The variables that were related in 21 effect studies 
to writing ability and attitudes to writing or 
writing apprehension; in parentheses, the number of 
times the relation was investigated. The numbers 
refer to the studies listed in appendix 2. 

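Didactic parameters

1. Teaching program
   Written composition skill (21): not tested: 2; no significant effect: 3, 4, 6, 7, 9, 11, 14, 16, 17, 18, 20, 21; significant effect: 1, 5, 8, 10, 12, 13, 15, 19
   Attitudes/Writing apprehension (10): no significant effect: 1, 3, 7d, 9, 14, 15, 17, 20; significant effect: 7d, 11, 21
2. Teacher
   Written composition skill (8): no significant effect: 4, 5, 6, 11, 14, 18; significant effect: 1, 11a, 20
   Attitudes/Writing apprehension (4): no significant effect: 11, 14c, 20e; significant effect: 1, 11a, 14c, 20e
3. Class
   Written composition skill (2): 6, 17
   Attitudes/Writing apprehension: 17

Student parameters

4. Sex
   Written composition skill (3): no significant effect: 4, 19b; significant effect: 1, 19b
   Attitudes/Writing apprehension (2): no significant effect: 20; significant effect: 1
5. Intro/Extroversion
   Written composition skill (2): 14, 17
6. Writing Apprehension
   Written composition skill: significant effect: 11

Interactions

7. Program x teacher
   Written composition skill (5): no significant effect: 5, 11, 18, 20; significant effect: 1
   Attitudes/Writing apprehension (3): no significant effect: 1, 11; significant effect: 20
8. Program x sex of pupil
   Written composition skill (4): no significant effect: 11; significant effect: 1, 9, 16, 19
   Attitudes/Writing apprehension (2): 1, 9
9. Teacher x sex of pupil
   Written composition skill (2): 1, 4
   Attitudes/Writing apprehension (2): no significant effect: 1; significant effect: 20

a: Fox (1979): teacher effect does not occur in analyses of the whole group; does occur in analyses of subgroups (high/low writing apprehension).
b: Sager (1973): sex effect on three subvariables, not on two others.
c: Lyons (1976): teacher effect occurs for one subvariable of attitudes, not for three other subvariables.
d: Delaney (1980): significant effect on some attitude variables, not on others.
e: Sears (1971): teacher effect occurs for effort expectations, not for estimate of own ability.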
Does the use of students as commentators on essays have a
greater effect on writing ability than having the teacher
comment on them? Table 2 shows that statistically
significant differences between the scores of the different
groups were found in only eight of the 21 studies. Of those
studies, seven gave a positive result in favor of the
experimental program, and one, Earls (1983), found the
control program to be superior. The picture is no
better when it comes to attitudes towards writing or writing
apprehension: a significant difference in favor of the
experimental program was detected in three studies out of
ten.

Of the variables in the Didactic parameters category, the
teacher variable is the most important. Several studies were
set up in such a way that one teacher taught two classes: an
experimental and a control class. In the studies by Benson
(1979), Fox (1979), Lyons (1976) and Sears (1971) the teacher
variable is found to cause a statistically significant
effect. In Benson's study (1979) the effect is spurious,
since the teacher variable coincides with the year and age
of the students and with differences between schools. In Fox
(1979) the effect of the teacher variable is apparent only
when the data on a subgroup, viz. those with high writing
apprehension, are analyzed. It is not inconceivable that this
group of students is particularly sensitive to the person of
the teacher. In Lyons (1976) the teacher effect is
demonstrated only for one attitude concept, 'showing my
writing to teachers', which is understandably a
teacher-dependent concept. In her research, Sears (1971)
demonstrated a teacher effect on both writing
performance and an attitude concept: estimating effort. No
teacher effect was found on another attitude concept,
estimating one's own ability. In short, when it comes to
attitudes, a teacher effect is demonstrated in three studies,
with the same studies showing no teacher effect on other
concepts of attitude. Some attitudes, then, do appear to be
affected by the teacher. In the case of writing ability a
teacher effect is not likely to occur except in a sensitive
group of students (Fox, 1978).

The student parameters that were investigated and tested 
were sex, degree of introversion and extroversion, and 
writing apprehension. Benson (1979) found that girls 
performed better than boys; Carter (1982) found a similar 
tendency, though it was not statistically significant. Sager 
(1973) found a significant sex effect for three highly 
correlated variables (supporting information, sentence 
structure and overall assessment), but not for two other 
variables (vocabulary, organization). Benson (1979) found a
sex effect for essay quality, attitude, length of essays and
revision at the paragraph level, all in favor of girls. When
a sex effect is found, it is always a matter of girls
performing better than boys.

Fox (1978) showed that students with high writing 
apprehension performed significantly better after Fox's 
experimental teaching procedure than those students with high 
writing apprehension who were taught in the control 
condition. We have already seen that it looks as if this 
group of students is more sensitive to the personality of the 
teacher: this also appears to hold for the teaching that is 
given. 

Of the investigated interactions, those between program and
other variables are interesting. Since the interaction between
teacher and program turns out to be significant in
only two out of eight cases, we may infer that
teachers do not produce systematically better performance in
either of the programs they use. A conspicuous feature is the
relatively high incidence of a significant interaction
between program and sex of students, namely five
times out of six. This might be taken to indicate that the
teaching programs are sex-specific. Benson (1979) found that
girls benefited from structured, informative student
feedback, whereas boys fared better in the teacher feedback
condition. Myers (1979) found that there was no difference
between boys and girls in the control condition but that
girls did significantly better in the experimental condition.
Farrell (1977) reports results that contradict this. Sager
(1978) reports a leveling effect of the experimental teaching
program: the differences between boys and girls became
smaller in the experimental condition. The boys benefited
from student response. Although we must be aware that such
interactions can occur, there is too little to go on to
cherish particular hopes about them.

A conclusion of quite a different sort is the following
one. It can be deduced from table 2 that the reporting in the
consulted studies is defective. Sometimes results are
reported for the main effect of teacher, but there is nothing
about any interaction between teacher and program (4, 6, 14;
see table 2). In an investigation in which sex is included as
one of the variables, there is no reference to an interaction
between sex and program (e.g. 4). Only one investigator (Fox,
1979) has taken advantage of the opportunity to analyze data
for subgroups. Had others done so, we would have more information
about the effect of the level of writing ability: is student
feedback an efficient teaching procedure mainly for good or
mainly for weak students?

Since some studies distinguish several different aspects of
written composition ability, in a second analysis of these
studies we examined the dependent variables within writing
ability, in the hope of tracking down some promising
subvariables. The data in table 3 show that differences were
expected principally in formal qualities. In ten studies,
differences in quality were looked for within the categories
of Spelling and Punctuation and of Formulation and Style, in
three of them successfully. Differences in Organization and
Content were expected much less often. None of the variables
named in table 3 showed significant differences conspicuously
often.

No particularly promising dependent variables emerge from 
table 3. It is striking that the investigators stressed 
formal aspects of usage rather than rhetorical aspects, 
whereas the impression in educational literature is that it 
is precisely these rhetorical aspects that are considered to
be more, or at least equally, important.

Table 3: Specific Writing Performance Variables in effect 
studies. The numbers refer to the studies in 
appendix 2. 



[Table 3 columns: Category, Subcategory; * = significant (p < .05)]



An analysis of the study reports revealed at least three
problems. First, in only three of the studies (Benson, 1979;
Myers, 1979; Ward, 1959) was the statistical power
sufficiently high, i.e. greater than 80%. Second, in all the
quasi-experimental studies the data proved to have been
analyzed at the level of individual scores, though it was not
individuals but classes that had been randomly assigned to
the conditions. This means that the investigators were testing
against far too many degrees of freedom, because the number
of degrees of freedom was not corrected for intraclass
correlation. Third, in two-thirds of the studies peer
evaluation turned out not to be the only variable on which the
two conditions differed, so that interpretation of the
results was difficult and complicated.
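To make the degrees-of-freedom problem concrete, the sketch below applies the standard design effect for cluster samples, Deff = 1 + (m - 1) * rho, where m is the number of students per class and rho the intraclass correlation. The class size and rho used here are hypothetical illustration values, not figures taken from the reviewed studies, and the function names are ours.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Variance inflation caused by sampling whole classes instead of individuals."""
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n_students: int, cluster_size: float, icc: float) -> float:
    """Approximate number of independent observations the clustered sample is worth."""
    return n_students / design_effect(cluster_size, icc)


if __name__ == "__main__":
    # Hypothetical values: 4 classes of 25 students, intraclass correlation 0.20.
    n_classes, class_size, icc = 4, 25, 0.20
    n = n_classes * class_size
    deff = design_effect(class_size, icc)
    n_eff = effective_sample_size(n, class_size, icc)
    print(f"naive n = {n}, design effect = {deff:.2f}, effective n = {n_eff:.1f}")
```

With these assumed values, 100 students carry roughly the information of 17 independent observations, which is why testing against degrees of freedom based on individual scores overstates the evidence.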

4. HYPOTHESES 

In our study we examined effects on attitudes, on performance 
variables and writing process variables. This last cluster 
will be disregarded here (see Rijlaarsdam, 1986; Baltzer, 
1986; Rijlaarsdam, Baltzer & Schoonen, 1987). 

Using students as commentators on each other's texts will 
lead to texts showing more signs of the awareness that 
writing is communicating: the texts will then become goal and 
audience oriented. They will not reflect the content as it is 
stored in the memory of the writer, but an adaptation of it 
(cf. Flower's (1979) concepts of 'writer based prose' and 
'reader based prose'). Because of the communicative situation 
and the acquired models, essays will improve in style and 
organization. We also expect students to gain more confidence 
in their own ability from student response. They will enjoy
writing more. Students will find having their essays read
or assessed less threatening. Their attitude towards being 
assessed, implied by all written communicative acts, will be 
more positive. 

We will also consider whether it is true that weak students
gain more from peer feedback and good students from teacher
feedback. Good students are hypothesized to learn less from
texts that are below their level, whereas weak
students can profit from 'the zone of closest development'
(Hoover, 1972). We shall also look at possible interactions
between other student parameters and the teaching programs.
Girls, for example, seem to have a tendency to ascribe their
underachievement to a lack of ability when the feedback is
provided by the teacher. With boys this attribution process
occurs when they receive negative feedback from their peers
(Dweck et al., 1978). Thus for underachieving girls teacher
feedback, and for underachieving boys peer feedback, may be
disastrous. This pattern may be further intensified by a high
degree of writing apprehension.

5. METHOD AND PROCEDURES 

The purpose of the investigation was to test an experimental
method of teaching written composition. The experimental
design used included a pretest, a midtest and a posttest, in
which classes were blocked on the teacher variable. We were
able to use eight schools and a total of eleven teachers.
Each teacher taught two ninth-grade classes, which were
assigned arbitrarily to one or the other of the conditions. In
the experimental condition the students taught each other by
commenting on each other's essays in writing. In the control
condition the commenting task was reserved for the teacher.
Dependent variables were Writing Performance Variables (Goal
orientation, Audience orientation, Organization and Style)
and Psychological Variables (Fear-of-not-being-able-to-write,
Attitude-towards-writing, Attitude-towards-being-evaluated).
The comparison was carried out with the class as the unit of
analysis. So that the effect analyses would allow for possible
initial differences between classes, covariance analyses were
used as far as possible, with pretest scores as covariate.
Descriptive statistics were used to determine the quality of
the instruments and to describe test performances. Relations
between Psychological Variables and Writing Performance
Variables were described using Pearson's product-moment
correlations.
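A minimal sketch of a class-level covariance analysis, assuming one row per class holding its mean pretest score, its mean posttest score and a 0/1 condition indicator. The function name and the numbers are hypothetical, and the sketch leaves out the teacher blocks that the actual design also contained; it only illustrates how a pretest covariate adjusts the comparison between conditions.

```python
import numpy as np


def ancova_treatment_effect(post, pre, condition) -> float:
    """Return the pretest-adjusted difference between conditions.

    Fits post = b0 + b1 * condition + b2 * pre by ordinary least squares,
    with one observation per class (the unit of analysis).
    """
    post = np.asarray(post, dtype=float)
    pre = np.asarray(pre, dtype=float)
    condition = np.asarray(condition, dtype=float)
    design = np.column_stack([np.ones_like(pre), condition, pre])
    coef, *_ = np.linalg.lstsq(design, post, rcond=None)
    return float(coef[1])  # b1: adjusted effect of the experimental condition


# Hypothetical class means for six classes (1 = peer feedback, 0 = teacher feedback).
pre = [10.0, 12.0, 11.0, 9.0, 13.0, 10.5]
condition = [1, 1, 1, 0, 0, 0]
post = [p + 2.0 + 0.5 * c for p, c in zip(pre, condition)]  # built-in effect of 0.5
print(round(ancova_treatment_effect(post, pre, condition), 3))  # 0.5
```

Because the invented posttest means were built with an exact effect of 0.5, the fitted coefficient recovers it; with real class means the estimate would of course carry sampling error and be tested against degrees of freedom based on the number of classes.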

5.1. PROCEDURES 

The survey was conducted in the third-year classes (ninth grade)
of the VWO and HAVO departments (the highest and second
highest type of Dutch secondary school) at eight different
schools. Eleven teachers each taught two classes. Their age
varied from 27 to 48 (mean 37), their teaching experience
from four to 25 years (mean 15). At three moments all
students spent 90 minutes writing an essay on a task
written especially for this study. The task was the same for all
students. It had to be a discursive essay making use of
provided documentation. The midtest was identical to the
pretest and was given after three teaching blocks (see
teaching programs), or five months. The posttest was not
identical but similar to the pretest and midtest. The
posttest followed three months after the midtest. Between the
midtest and the posttest a single teaching block was given.
At all test moments an attitude inventory was also administered.

5.2. INSTRUMENTATION 

Essay scales were constructed for each performance variable 
(Goal Orientation, Audience Orientation, Organization and 
Style). These scales are a series of essays gradually 
increasing in quality which are a useful aid to raters 
because differences in quality and the associated textual 
features are clearly shown. Between two and five evaluative 
questions were asked for each variable, each question being 
accompanied by essay features that would be found in the
essay to be evaluated in the case of a positive or negative 
answer, as the case might be. The following evaluative 
questions were asked: 

Audience orientation: The relation between writer and 
reader: does the writer make contact with the reader? The 
relation between subject and reader: does the text contain 
content elements from which it is apparent that the reader 
has some first-hand knowledge of the subject? 

Goal orientation: Does the text contain a clear 
standpoint? Does the text contain content elements that 
increase/reduce the cogency of the argument? 

Organization: Is the text well arranged visually? Is the 
essay well divided into beginning, middle and end? Is the 
principal theme formulated in the introduction and end? Are 
the paragraphs properly linked? Are they properly structured?

Style: Is there variation in sentence structure and 
vocabulary? Does the essay contain linguistic and stylistic 
devices? Is the language personal? 

For the Psychological Variables an inventory was
constructed on the basis of Miller & Daly's Writing
Apprehension Test and Bergen's Situation-Specific
Apprehension Test. The instrumentation resulted in three
scales: Fear-of-not-being-able-to-write, or Ease of Writing
(EaW) (14 items, Cronbach's alpha .90, characteristic item:
Even before I begin my essay I think it will turn out badly),
Attitude-towards-being-evaluated, or Rewards of Writing (ReW)
(8 items, Cronbach's alpha .80, characteristic item: I would
like my friends to read what I have written) and
Attitude-towards-writing, or Enjoyment of Writing (EnW) (9
items, Cronbach's alpha .91, characteristic item: Writing is
fun). All items correlated with their scale at .47 or higher. The
correlations between the scales varied from .41 to .48.
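For reference, the sketch below computes Cronbach's alpha and item/rest correlations from a respondents-by-items score matrix using the textbook formulas. The demo data are invented and the function names are ours; nothing here reproduces the study's own computations.

```python
import numpy as np


def cronbach_alpha(scores) -> float:
    """Cronbach's alpha for a (respondents x items) score matrix."""
    x = np.asarray(scores, dtype=float)
    k = x.shape[1]
    item_var_sum = x.var(axis=0, ddof=1).sum()  # sum of the item variances
    total_var = x.sum(axis=1).var(ddof=1)       # variance of the scale sum scores
    return float(k / (k - 1) * (1.0 - item_var_sum / total_var))


def item_rest_correlations(scores) -> list:
    """Pearson correlation of each item with the sum of the remaining items."""
    x = np.asarray(scores, dtype=float)
    total = x.sum(axis=1)
    return [float(np.corrcoef(x[:, j], total - x[:, j])[0, 1]) for j in range(x.shape[1])]


demo = [[1, 2], [2, 1], [3, 3]]  # three respondents, two items (invented data)
print(round(cronbach_alpha(demo), 3))                       # 0.667
print([round(r, 3) for r in item_rest_correlations(demo)])  # [0.5, 0.5]
```

The item/rest variant excludes the item from the total it is correlated with, so an item cannot inflate its own correlation with the scale.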

5.3. EXPERIMENTAL AND CONTROL TEACHING PROGRAM 

Development of the experimental teaching program. 
At one school, one of the authors and two fellow teachers
developed and tested a procedure for teaching the writing of
goal-oriented and audience-oriented discursive texts in four
third forms (comparable to ninth grade) in two types of
secondary school. By means of process research, questionnaires,
analyses of the reliability of the feedback instruments, and
learner reports, this procedure was evaluated and refined.
The final program contained four block courses of 10-12
lessons each. All block courses were constructed on the same
pattern, as follows.

A. Preparatory lessons (3-6 lessons). In these lessons the
students studied instruction texts on aspects of writing and
texts. Two aspects were introduced in each course: goal and
audience orientation in the first course, organization and
trustworthiness in the second, news value and usage in the
third. In the fourth course the instruction text contained a
synthesis of all the steps in the process that had been dealt
with up to that point. Moreover, in the preparatory lessons
some tasks were set for studying the writing assignment and
the information on the subject, consisting of articles or cuttings
from newspapers, magazines and brochures. The students then
wrote, in class, first a rough draft and then a first
version. As a reflection task they then described how their
first version had been written ('How did the writing process
go this time?') and what they themselves thought of their
texts ('What do I think of my essay?').

B. Commentary lessons. Each student was given a copy of
the essays of three arbitrarily selected, anonymous peers. He 
read each essay and by means of a subjective reaction form 
with items like 'I am not convinced' indicated his impression 
of what he had read. After this first reading the student 
carried out his second reflection task: 'What have I learned 
from reading texts written by another?' Then he read the same 
essays more closely and gave more detailed feedback by 
answering questions on a comment form. The aspects on which 
comments were given corresponded to those in the instruction 
texts. Five questions of this type were asked on each aspect. 
Besides answering these comment questions the commentator 
also had to carry out a variety of tasks in the essays 
themselves, such as indicating audience-oriented sentences 
and phrases and the transitions between introduction, body 
and end. At the same time some of the comment questions 
obliged the commentator to indicate features to be judged 
positively or negatively in the essay itself. 

Commenting on an essay took 30-45 minutes. Students 
performed the task in two lessons; what they did not finish 
in class they did at home. For the conclusion of this stage 
students carried out a third reflection task: 'What have I
learned from commenting on received texts written by
another?'

C. Processing comments. Students were given subjective
reader-based and objective criterion-based feedback, all
notes on their essays, and any further written notes to these
assessments, from three anonymous commentators. They ordered
the comments, summarized them and wrote comments on them. It
was stressed in their instructions that writers were
responsible for their own choice from the feedback, but that
they had to give reasons for their choice. Next, the students
drew up a rewriting plan in which they indicated what
they planned to change and how they wanted to change it.
Processing comments was done partly at school (2-3 lessons)
and partly at home.

D. Final version. On the basis of the rewriting plan a
final version was written in the course of a single lesson. 

The student then gathered his work together in a ring 
binder. The teacher evaluated the binders for each block 
course for their completeness and neatness and for the 
quality of the work, but not the essays. Every quarter a 
student selected one of his essays for evaluation by the 
teacher for a mark counting towards his end-of-term report. 

The students in the control groups went through exactly 
the same program, except that at stage B it was the teacher 
who provided the feedback, using the same forms, criteria and 
tasks as in the experimental condition. The teacher spent 
approximately half an hour on each essay. 

5.4. DATA COLLECTION 

A total of 561 students participated in the main survey, and 
we had test data on 76% of them. For the analysis twelve 
students were selected at random from each class. This 
subgroup was found to be representative as far as the 
Psychological Variables were concerned, the only dependent 
variables for which we were able to make this comparison. The 
scores of these twelve students formed the basis for the 
calculation of the class scores. 

The 792 essays (22 classes x 12 students x 3 tests) were 
typed and evaluated by two trained raters using the essay 
scales for Goal Orientation, Audience Orientation, 
Organization and Style. The inter-rater reliability of the 
essay ratings (n=792) varied from .64 (Audience Orientation) 
to .78 (Organization, Goal Orientation), with a mean of .74. 
Intra-rater reliability (n=30) varied from .69 (Organization) 
to .92 (Style), with a mean of .85. 
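The text does not say which coefficient underlies the inter-rater and intra-rater figures. As a minimal sketch, assuming a Pearson product-moment correlation between two sets of ratings of the same essays (a common choice, not confirmed by the source), the computation could look like this; the scores are hypothetical.

```python
import numpy as np


def interrater_correlation(rater_a, rater_b):
    """Pearson correlation between two sets of scores for the same essays.

    Assumption: reliability is expressed as a simple product-moment
    correlation; the study does not state which coefficient it used.
    """
    a = np.asarray(rater_a, dtype=float)
    b = np.asarray(rater_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("both ratings must cover the same essays")
    return float(np.corrcoef(a, b)[0, 1])


# Hypothetical scale scores for six essays rated by two raters.
rater_1 = [98, 104, 110, 95, 101, 107]
rater_2 = [100, 103, 112, 97, 99, 109]
print(round(interrater_correlation(rater_1, rater_2), 2))
```

For intra-rater reliability the same function would be applied to one rater's two ratings of the same 30 essays.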

At each of the three test moments an attitude inventory was 
administered to all students. It consisted of 31 items 
representing three variables. To determine how the scales 
behaved in the main survey, they were analyzed again. The 
items were mirrored so that a high score is positive: little 
apprehension (EaW) or positive attitudes (ReW and EnW). 
Students who failed to answer one or more items were not 
included in the calculations. 
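Mirroring and listwise deletion can be sketched as follows. The five-point response scale and the choice of which items are reverse-keyed are assumptions for illustration; the source only states that items were mirrored so that a high score is positive and that respondents with missing items were excluded.

```python
import numpy as np


def score_scale(responses, reverse_items, low=1, high=5):
    """Mirror reverse-keyed items and drop incomplete respondents.

    responses: 2-D array (students x items), np.nan for unanswered items.
    reverse_items: column indices of items on which a high raw score is
        negative (hypothetical; the inventory's keying is not given).
    low, high: assumed endpoints of the response scale.
    Returns the cleaned item matrix and each remaining student's sum score.
    """
    x = np.asarray(responses, dtype=float).copy()
    # Mirror: on a low..high scale, a score x becomes low + high - x.
    x[:, reverse_items] = low + high - x[:, reverse_items]
    # Listwise deletion: keep only students who answered every item.
    complete = ~np.isnan(x).any(axis=1)
    x = x[complete]
    return x, x.sum(axis=1)


items, sums = score_scale([[1, 5, 3], [2, np.nan, 4], [5, 1, 2]], reverse_items=[1])
print(items.shape, sums)
```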

For each scale we calculated the homogeneity (Cronbach's 
alpha) and the item/rest correlations, as well as the 
correlations between the sum scores of the three scales. All 
calculations were carried out separately for each 
administration of the tests. The correlations between the 
scales vary from .27 to .40, with a mean of .34. The scales 
proved to be homogeneous. For the 'Fear-of-not-being-able-to-write' 
scale the alphas at the three moments were .90, .89 and 
.89, and the item/rest correlations varied from .35 to .71. 
The alphas of the 'Attitude-towards-being-evaluated' scale 
were .80, .81 and .82; the item/rest correlations varied 
between .29 and .66. For the 'Attitude-towards-writing' scale 
we found alphas of .90, .93 and .93 and item/rest 
correlations varying from .56 to .85. 
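For reference, Cronbach's alpha for a scale of k items is alpha = k/(k-1) * (1 - sum of item variances / variance of the sum score), and the item/rest correlation correlates each item with the sum of the remaining items. A minimal NumPy sketch of both statistics, not the authors' software:

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for a (respondents x items) score matrix."""
    x = np.asarray(items, dtype=float)
    k = x.shape[1]
    sum_item_var = x.var(axis=0, ddof=1).sum()   # sum of the k item variances
    total_var = x.sum(axis=1).var(ddof=1)        # variance of the sum score
    return k / (k - 1) * (1 - sum_item_var / total_var)


def item_rest_correlations(items):
    """Correlation of each item with the sum of all other items."""
    x = np.asarray(items, dtype=float)
    total = x.sum(axis=1)
    return np.array(
        [np.corrcoef(x[:, j], total - x[:, j])[0, 1] for j in range(x.shape[1])]
    )


scores = [[1, 2], [2, 3], [3, 3]]
print(cronbach_alpha(scores), item_rest_correlations(scores))
```

Running these per scale and per test moment corresponds to computing the statistics "separately for each administration", as described above.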



5.5. MONITORING THE IMPLEMENTATION 

The implementation of the teaching programs was monitored by 
three methods: teacher and student logbooks, observations and 
questionnaires. Throughout the program, both teachers and 
students kept logbooks. Analysis of these showed that 
teachers sometimes found it necessary to make small 
adjustments to the timetable in the teacher manual. None of 
these changes materially affected the programs. 

Lessons were observed at two moments, namely during the 
second and fourth teaching blocks. The time-on-task 
observations showed that the average student spent about 
thirty minutes, three-quarters of a lesson, on the activities 
the program demanded. The experimental group seemed to be 
more involved in the teaching program than the control group, 
though the difference in the fourth teaching block was no 
longer statistically significant. 

On the same two occasions students completed questionnaires 
of between 61 and 86 questions. These questionnaires allowed 
us, first, to determine whether, according to the students, 
all planned learning activities had taken place. Second, we 
could see which feedback messages the students had received 
and establish any differences between the two conditions. 
Third, we were able to find out what students themselves 
thought of the quality of the feedback they had received and 
what influence they thought the comments had had on the 
rewritten versions of their essays. 

Table 4: Average time spent on lesson content in the two 
         conditions, as a percentage of total time (n=36) 
         (E=experimental group; C=control group) 

Moment of            E                  C 
observation     mean     sd        mean     sd       t-value 

2nd block       83.80    17.24     72.14    17.27    2.87* 
4th block       78.72    16.27     73.56    15.73    1.36 

*significant, p < .05 
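The t-values in table 4 can be checked from the summary statistics with an independent-samples Student t-test. This sketch assumes that n=36 means 36 observations per condition and that equal variances were assumed; under these assumptions it gives t=2.87 for the second block and t=1.37 for the fourth (the table reports 1.36, presumably a rounding difference).

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t for two independent groups, from summary statistics.

    Uses the pooled (equal-variance) standard error.
    Returns (t, degrees of freedom).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


# Table 4 values; 36 observations per condition is an assumption.
t2, df = pooled_t(83.80, 17.24, 36, 72.14, 17.27, 36)
t4, _ = pooled_t(78.72, 16.27, 36, 73.56, 15.73, 36)
print(f"2nd block: t({df}) = {t2:.2f}")  # 2nd block: t(70) = 2.87
print(f"4th block: t({df}) = {t4:.2f}")  # 4th block: t(70) = 1.37
```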

In terms of circumstances (Independence, Teacher effort, 
Working atmosphere, Own effort) the two conditions proved to 
[...] activities. The core elements of the commenting task 
emerged as having been strongly implemented in the 
experimental condition. The students were found to have put a 
lot of time into the commenting task: only about 10% said 
they had finished with an essay within a quarter of an hour, 
whereas 8% (second block) and 24% (fourth block) spent 45 
minutes on it. The answers to questions on the comments 
received also showed that the core of the feedback task was 
strongly implemented. However, there were still some 
differences between the conditions. In the experimental 
condition 53% of the students said that one or two of their 
classmates had written no comments in the essay itself, 
whereas in the control condition 26% of students made the 
same observation about their teacher. 

Table 5: Nature and quantity of feedback in both conditions 
         at two moments: the percentage of students per 
         condition who said they had received no comment from 
         their teacher (control condition) or little or no 
         comment from one classmate (experimental condition) 
         (E=Experimental group, C=Control group) 

                                                     Block 2        Block 4 
Commenting tasks                                     E      C       E      C 

 1. Filling in 12 reaction statements                5.7   14.0    n.a.   n.a. 
 2. Description of first impression                 12.3   15.2    15.9   17.3 
 3. Answering of comment questions                  10.3   18.6     9.1   19.3 
 4. Description of writer's intentions               8.4    9.5    11.3   13.9 
 5. Wavy lines under audience-oriented sentences    54.3   26.9    36.4   39.4 
 6. Marking of parts (beginning, middle, end)       48.8   34.9    n.a.   n.a. 
 7. Marking of principal sentences in paragraphs    49.0   26.9    n.a.   n.a. 
 8. Marking of linking words and sentences          64.7   30.6    n.a.   n.a. 
 9. Marking of structure-indicating words 
    and sentences                                   77.7   53.7    n.a.   n.a. 
10. Reaction signs in margin                        44.4   3?.0    n.a.   n.a. 
11. Placing reference numbers for comment 
    questions in essay                              61.1   23.3    48.2   23.3 
12. Writing amplification to answers                28.1   35.3    18.9   32.4 
13. Writing remarks in the essay                    53.1   26.1    52.3   32.2 

The great majority of students in both conditions 
appreciated the feedback they had received. It was 
conspicuous that in the control condition in the fourth block 
very few students had a negative opinion of the feedback they 
had received from their teacher (table 6). 
Table 6: Dissatisfaction concerning the feedback received. 
         Percentages of students who judged negatively the 
         comments of two or three peers (E) or of the teacher 
         (C) (E=Experimental group, C=Control group) 

                  Block 2          Block 4 
                  E       C        E       C 

Clarity           11.6    22.6     10.9    5.3 
Care taken        27.7    16.2     16.0    2.7 
Helpfulness       33.2    21.0     25.5    1.5 


Assimilation of comments also proved to be strongly 
implemented. There were no great differences between the 
conditions. 

6. RESULTS 

We have already seen that the most important precondition 
for testing was met: in both the experimental and the control 
condition the program was well implemented. In addition, the 
measurements of the dependent variables proved reasonably 
reliable (inter-rater reliability for the performance 
variables and homogeneity for the Psychological Variables). 

The effects of the program were tested separately for the 
performance variables (Goal Orientation, Audience 
Orientation, Organization and Style) and the Psychological 
Variables (Ease of Writing, Rewards of Writing and Enjoyment 
of Writing). In the following we first report on the 
performance variables and then on the Psychological 
Variables. We will then explore the effects of variables such 
as sex, writing ability and writing apprehension. 
6.1. WRITING PERFORMANCE 



Table 7 shows the correlations between the Writing 
Performance Variables at various times for the total sample 
and for the experimental and control groups. 

Table 7: Correlations between pretest, midtest and posttest 
         scores for Writing Performance Variables for all 
         classes (T) (n=22) and the classes in each condition 
         (E/C) (n=11) 

                          Pretest - midtest      Pretest - posttest 
Dependent variables       T      E      C        T      E      C 

Goal Orientation         .38    .41    .41     -.02    .27   -.35 
Audience Orientation    -.02   -.22    .51      .22    .27    .49 
Organization             .36    .19    .70      .52    .62    .37 
Style                    .52    .47    .60      .66    .63    .71 

The values of the correlations between performance at the 
various moments justify a covariate approach, at least insofar 
as there are differences in performance between the moments. 
Table 8 gives the means and standard deviations at the three 
moments. 
Table 8: Means and standard deviations (in parentheses) of 
         class scores in both conditions on the pretest (1), 
         midtest (2) and posttest (3) for the dependent 
         variables Goal Orientation, Audience Orientation, 
         Organization and Style (all four assessed by scale) 

                      Experimental (n=11)          Control (n=11) 
Testing moment        1        2        3          1        2        3 

Goal Orientation      102.48   105.42   105.55     100.92   106.70   106.72 
                      (4.23)   (5.01)   (5.97)     (4.73)   (5.89)   (4.11) 
Audience Orientation  103.37   106.60   100.30     100.98   108.52   101.92 
                      (2.96)   (5.79)   (4.20)     (2.40)   (4.56)   (3.08) 
Organization           91.54   101.88   106.23      91.92    99.53   106.56 
                      (3.96)   (4.50)   (6.08)     (2.47)   (5.20)   (5.84) 
Style                 101.34   104.85   111.68     101.87   105.93   112.17 
                      (3.98)   (5.95)   (3.69)     (3.32)   (4.49)   (2.95) 



There are small differences between the mean class scores at 
the three moments. 

By and large the assumptions for MANCOVA are met, so 
testing is justified. 

Corrected for the scores on the pretest, the differences 
between the conditions are small at the second moment 
(MANCOVA: F=1.97, df=4/13, p=.16) and negligible at the third 
moment (MANCOVA: F=.57, df=4/13, p=.69). In other words, we 
are unable to show any statistically significant difference 
between the conditions. 
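The MANCOVA degrees of freedom follow from the design. With two conditions the treatment effect has one hypothesis degree of freedom, so Wilks' lambda converts exactly to F with df1 = p and df2 = df_error - p + 1, where p is the number of dependent variables and df_error = N - 2 - q for N units and q covariates. With 22 classes, four dependent variables and four pretest covariates this gives 4/13; with three Psychological Variables and three covariates it gives 3/15, the df reported in section 6.2. The student-level tests in tables 11 to 13 fit the same pattern (df2 = number of cases - 9). The sketch below assumes that the pretest scores on all dependent variables served as covariates, which the text implies but does not spell out. It is an illustration, not the authors' analysis, and it omits the p-value, which could be obtained from an F distribution (for example with `scipy.stats.f.sf`).

```python
import numpy as np


def mancova_two_groups(Y, covariates, group):
    """Wilks' lambda test of a two-level treatment effect, adjusting for covariates.

    Y: (n, p) dependent variables, e.g. midtest scores on four scales.
    covariates: (n, q) covariates, e.g. pretest scores.
    group: length-n array of 0/1 condition codes.
    Returns (F, df1, df2, wilks_lambda); with one hypothesis degree of
    freedom the F transformation of Wilks' lambda is exact.
    """
    Y = np.asarray(Y, dtype=float)
    n, p = Y.shape
    X_cov = np.asarray(covariates, dtype=float).reshape(n, -1)
    g = np.asarray(group, dtype=float).reshape(n, 1)
    ones = np.ones((n, 1))

    def residual_sscp(X):
        # Least-squares fit of all dependent variables at once;
        # returns the residual sums of squares and cross-products.
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        resid = Y - X @ beta
        return resid.T @ resid

    X_full = np.hstack([ones, g, X_cov])   # intercept + condition + covariates
    X_reduced = np.hstack([ones, X_cov])   # same model without condition
    E = residual_sscp(X_full)              # error SSCP
    E_reduced = residual_sscp(X_reduced)   # equals E + H (H: condition SSCP)
    wilks = np.linalg.det(E) / np.linalg.det(E_reduced)

    df_error = n - int(np.linalg.matrix_rank(X_full))
    df1, df2 = p, df_error - p + 1
    F = (1 - wilks) / wilks * df2 / df1
    return F, df1, df2, wilks


if __name__ == "__main__":
    # Simulated class scores: 22 classes, 11 per condition (hypothetical data).
    rng = np.random.default_rng(0)
    pre = rng.normal(100, 4, size=(22, 4))
    mid = pre + rng.normal(5, 4, size=(22, 4))
    cond = np.repeat([0, 1], 11)
    F, df1, df2, lam = mancova_two_groups(mid, pre, cond)
    print(f"F={F:.2f}, df={df1}/{df2}")  # df is 4/13; F depends on the data
```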



6.2. PSYCHOLOGICAL VARIABLES 



Table 9 gives the correlations between the Psychological 
Variables at the three moments, for the total sample and for 
the experimental and control groups. 
Table 9: Correlations between pretest, midtest and posttest 
         scores for Psychological Variables for all classes 
         (T) (n=22) and the classes in each condition (E/C) 
         (n=11) 

                                    Pretest - midtest     Pretest - posttest 
Dependent variables                 T      E      C       T      E      C 

Fear-of-not-being-able-to-write    .61    .61    .61     .81    .73    .89 
Attitude-towards-being-evaluated   .53    .33    .72     .46    .30    .72 
Attitude-towards-writing           .58    .14    .72     .49   -.10    .74 



The correlations in table 9 also make a covariate approach 
appropriate. Testing is only justified if there are 
differences between means; table 10 shows that these 
differences exist. 



The differences in table 10 were tested with MANCOVA. The 
assumptions for MANCOVA are fulfilled. The multivariate null 
hypothesis cannot be rejected for the Psychological 
Variables (moment 2: F=.12, df=3/15, p=.95; moment 3: F=.30, 
df=3/15, p=.83). There are no statistically significant 
differences between the conditions. 
Table 10: Means and standard deviations (in parentheses) of 
          class scores on the Psychological Variables 

                                   Experimental (n=11)       Control (n=11) 
Testing moment                     1       2       3         1       2       3 

Fear-of-not-being-able-to-write    47.20   47.06   47.69     46.48   46.72   47.19 
                                   (3.20)  (4.12)  (3.02)    (3.03)  (3.85)  (3.54) 
Attitude-towards-being-evaluated   24.10   25.17   25.06     24.53   24.83   24.81 
                                   (2.54)  (1.65)  (3.45)    (2.40)  (2.42)  (2.48) 
Attitude-towards-writing           27.02   24.79   25.55     25.30   24.01   24.57 
                                   (1.52)  (2.61)  (4.01)    (3.86)  (3.47)  (4.22) 



6.3. EXPLORATIONS 

In our discussion of the literature we observed that peer 
evaluation may be particularly suitable for certain groups of 
students. Three important moderating variables emerged: sex, 
writing ability and writing apprehension. 

By way of exploration we examined what effect these 
variables had on the performance variables. For this purpose 
we left the class level and carried out the analyses at 
student level. This considerably increased the statistical 
power, though the number of degrees of freedom for the tests 
was overestimated because of intraclass correlation. Because 
the analyses were exploratory, no correction (multiplying the 
degrees of freedom by 2/3) was carried out. 
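Why student-level degrees of freedom overstate the information in clustered data can be illustrated with Kish's design effect, deff = 1 + (m - 1) * rho, where m is the number of students per class and rho the intraclass correlation. This is a standard survey-sampling formula swapped in for illustration, not the authors' procedure. Shrinking the effective sample to 2/3 of its nominal size corresponds to deff = 1.5, which for classes of m = 12 implies rho = 0.5/11, about .045. The figure of 264 students (22 classes x 12 students) assumes that all selected students entered the student-level analyses.

```python
def design_effect(cluster_size, icc):
    """Kish design effect for equal-sized clusters: 1 + (m - 1) * rho."""
    return 1 + (cluster_size - 1) * icc


def effective_n(n, cluster_size, icc):
    """Number of independent observations the clustered sample is worth."""
    return n / design_effect(cluster_size, icc)


def icc_for_shrink(cluster_size, factor):
    """Intraclass correlation at which the effective n equals factor * n."""
    return (1 / factor - 1) / (cluster_size - 1)


n, m = 264, 12                      # 22 classes x 12 students (assumed)
rho = icc_for_shrink(m, 2 / 3)
print(round(rho, 3))                # 0.045
print(round(effective_n(n, m, rho)))  # 176
# The 2/3 rule of thumb applied to the denominator df of F(4, 212):
print(round(212 * 2 / 3))           # 141
```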

The relation between program and sex was examined using 
multivariate analysis of covariance with program, sex and 
teacher as independent variables, performance at moment 1 as 
covariates, and performance at moments 2 and 3 respectively 
as dependent variables. A main effect of sex could be 
demonstrated for performance at t2 (F=3.42, df=4/212, 
p=.01). Univariate analysis showed that this was principally 
a matter of the build-up scores: boys scored higher than 
girls. At t3 there was no longer a statistically significant 
sex effect. The interaction effect of program and sex was not 
significant (F=1.47, df=4/212, p=.21). 
Figure 2: Progress by boys and girls in experimental and 
          control condition, broken down by the four 
          dependent performance variables 
The same applies to the poor and good achievers. For each 
dependent variable (Goal Orientation, Audience Orientation, 
Organization and Style) a group of poor achievers and a group 
of good achievers was defined by selecting the students whose 
scores fell in the lowest and the highest quartile 
respectively. (From now on all our comparisons are between 
moment 1 and moment 2, since that is where we expected to 
find the greatest effects.) Table 11 gives the results of the 
tests for each dependent variable and each group. 
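The selection of extreme groups can be sketched as below. The source does not say how quartile boundaries or ties were handled; this sketch assumes NumPy's default linear-interpolation percentiles and strict inequalities, applied separately to the moment-1 scores on each selection variable. The scores in the usage line are hypothetical.

```python
import numpy as np


def extreme_quartile_groups(scores):
    """Boolean masks for students below the first and above the third quartile.

    Boundary handling (interpolated percentiles, strict inequalities)
    is an assumption, not taken from the study.
    """
    s = np.asarray(scores, dtype=float)
    q1, q3 = np.percentile(s, [25, 75])
    return s < q1, s > q3


# Hypothetical moment-1 scores for twenty students on one selection variable.
low, high = extreme_quartile_groups(range(1, 21))
print(int(low.sum()), int(high.sum()))
```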

Table 11: Results of MANCOVA tests of treatment effects for 
          selected groups: low initial level on Goal 
          Orientation, Audience Orientation, Organization 
          or Style, or high initial level on Goal 
          Orientation, Audience Orientation, Organization 
          or Style. 

Group   Selection variable      F-value   degrees of   p-value   number of 
                                          freedom                cases 

LOW     Goal Orientation          .62      4/58          .65        67 
        Audience Orientation     1.76      4/59          .15        68 
        Organization              .72      4/64          .58        73 
        Style                     .36      4/59          .84        68 
HIGH    Goal Orientation         1.97      4/55          .11        64 
        Audience Orientation     2.67      4/50          .04        59 
        Organization             1.83      4/51          .14        60 
        Style                    2.78      4/50          .04        59 



For poor writers (first quartile) there is no difference 
between the programs. For good writers (fourth quartile) 
there appear to be some differences: students who score high 
on Audience Orientation benefit more from the control 
program, whereas those who score high on Style benefit more 
from the experimental program. However, the effects are small 
and ambiguous. 

The results show no differential effect of peer evaluation 
for a particular category of writers. When we instead select 
poor and good achievers on the basis of the Psychological 
Variables, the effects are almost totally absent. Here again 
we carried out analyses for the first and fourth quartiles. 
The results are given in table 12. 

In these groups too no unambiguous differential effects could 
be demonstrated. The marginal significance for the group 
'very much at ease about writing' favored the experimental 
program and was expressed mainly in Organization scores. 

Table 12: Results of MANCOVA tests of treatment effects for 
          selected groups: low initial level on Ease of 
          Writing, Rewards of Writing or Enjoyment of Writing, 
          or high initial level on Ease of Writing, Rewards of 
          Writing or Enjoyment of Writing. 

Group   Selection variable      F-value   degrees of   p-value   number of 
                                          freedom                cases 

LOW     Ease of Writing          1.25      4/50          .30        59 
        Rewards of Writing       1.27      4/50          .29        59 
        Enjoyment of Writing     1.15      4/49          .34        58 
HIGH    Ease of Writing          1.95      4/71          .11        80 
        Rewards of Writing        .89      4/43          .48        52 
        Enjoyment of Writing     1.44      4/59          .23        68 



Finally, the literature suggests that peer feedback is 
superior to teacher feedback principally for underachieving 
girls, and that for underachieving boys the reverse holds: 
teacher feedback is suggested to be superior to peer 
feedback. To explore this hypothesis we carried out the 
MANCOVAs for girls and boys with a low initial level (t1) on 
the performance variables, that is, below the median score 
for the whole sample. The results in table 13 show that once 
again there is virtually no differential effect. The effect 
for girls scoring low on Goal Orientation (t1) is in favor of 
the control group. The marginal effects for girls on 
Organization and Audience Orientation, by contrast, were in 
favor of peer feedback. For underachieving boys there are no 
clear effects at all. In short, the effects are either 
absent, or they are small and point in different directions. 
Moreover, allowance has to be made for chance capitalization. 
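The warning about chance capitalization can be made concrete. Table 13 reports eight tests; if they were independent and every null hypothesis were true, the chance of at least one p-value below .05 would be 1 - .95^8, about .34, and a Bonferroni-adjusted threshold would be .05/8. The independence assumption is a simplification (the girls' groups on different selection variables share students), so this is only an illustration, not a reanalysis.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """P(at least one false positive) among independent tests at level alpha."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance level that bounds the familywise error at alpha."""
    return alpha / n_tests


# p-values from table 13 (groups with a low initial level).
table13_p = {
    ("girls", "Goal Orientation"): 0.03,
    ("girls", "Audience Orientation"): 0.16,
    ("girls", "Organization"): 0.13,
    ("girls", "Style"): 0.57,
    ("boys", "Goal Orientation"): 0.85,
    ("boys", "Audience Orientation"): 0.20,
    ("boys", "Organization"): 0.77,
    ("boys", "Style"): 0.52,
}
k = len(table13_p)
print(round(familywise_error_rate(k), 2))  # 0.34
threshold = bonferroni_threshold(k)
survivors = [key for key, p in table13_p.items() if p < threshold]
print(threshold, survivors)                # 0.00625 []
```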

These explorations were unable to find any clear treatment 
effects for particular selected groups of students. Thus, 
this survey provides no support for the theses (see xxx). In 
the next section these results are evaluated. 



Table 13: Results of MANCOVA tests of treatment effects for 
          selected groups: girls with a low initial level 
          on Goal Orientation, Audience Orientation, 
          Organization or Style, and boys with a low 
          initial level on Goal Orientation, Audience 
          Orientation, Organization or Style. 

Group   Selection variable      F-value   degrees of   p-value   number of 
                                          freedom                cases 

GIRLS   Goal Orientation         2.89      4/62          .03        71 
LOW     Audience Orientation     1.73      4/56          .16        65 
        Organization             1.82      4/75          .13        84 
        Style                     .74      4/63          .57        72 
BOYS    Goal Orientation          .34      4/52          .85        61 
LOW     Audience Orientation     1.55      4/56          .20        65 
        Organization              .46      4/41          .77        50 
        Style                     .82      4/45          .52        54 



7. DISCUSSION 

Why does peer evaluation produce no differences in scores on 
the Psychological Variables and Writing Performance? In this 
section we will suggest and test some explanations. First we 
will look at the Psychological Variables, after which we will 
move on to the Writing Performance Variables. 

Practitioners have attributed an expected improvement in 
Attitude-towards-being-evaluated to the safer task situation 
in peer evaluation, in which the evaluating teacher is 
eliminated. However, because a rewriting phase was added to 
the instruction series in both conditions, the task situation 
was less threatening than it usually is in the teaching of 
written composition, where writing an essay is commonly 
associated with grades. Scores on Attitude-towards-being-evaluated 
show a slight rise in both conditions, but not to such 
an extent that we can conclude that adding a rewriting phase 
leads to less writing apprehension, unless the fear of being 
evaluated commonly increases during the year in which a 
student is in the third form of HAVO and VWO schools and our 
teaching programs inhibited that increase. But we know 
nothing about the 'normal' level of Attitude-towards-being-evaluated. 
In any event, peer evaluation does not lead to 
progress on this variable. It was thought that by comparing 
their products with those of others, students would acquire 
more knowledge about their own skills, and that this would 
lead to increased confidence in their own ability and hence 
to lower scores on the Fear-of-not-being-able-to-write 
variable. The scores showed no development during the period 
studied and no differences between the two conditions. 
Evidently peer evaluation does not produce an increase in 
confidence in one's own writing ability. 

It was suggested that if students read good texts and 
started writing better texts they would gain more enjoyment 
from writing. Table 10 shows that in both conditions the 
enjoyment of writing declined and that there were scarcely 
any differences between the two conditions. However, the 
teachers involved in the survey told us while it was still in 
progress that students found the course very intensive. 
Perhaps the reduction in enjoyment can be attributed to this 
factor. 

That peer evaluation did not lead to differences between 
the conditions on the three Psychological Variables may be 
attributed to the differences between the teaching programs, 
which were perhaps not large enough. We will return to this 
argument when discussing the results on the Writing 
Performance Variables. In the case of the Psychological 
Variables we would like to add two further possible 
explanations. First, the correlations between student scores 
at the various testing moments indicate that the three 
variables represent a fairly stable trait which evidently 
does not lend itself to being influenced by a writing skills 
course at this age. Second, the correlations between the 
Psychological Variables and the Writing Performance Variables 
are very low and not even statistically significant. It was 
assumed, for example for Attitude-towards-being-evaluated, 
that the Psychological Variables would change as the quality 
of one's own essays increased; a causal connection of this 
kind is ruled out if only because the two do not covary. 
Here we would like to emphasize that there are signs that the 
Psychological Variables represent stable traits and that 
there is no connection between these variables and the 
Writing Performance Variables. These two conclusions may 
separately or together explain why peer evaluation had no 
effect on the scores for the three Psychological Variables. 

How can the absence of differences on the Writing 
Performance Variables be explained? The first explanation 
might be that there was very little difference between the 
teaching programs. We did not make things easy for ourselves 
by comparing peer evaluation with an 'average' way of teaching 
written composition; instead we constructed a competitor that 
comprised much more than is common in the teaching of written 
composition. Although the two programs were made to resemble 
each other as closely as possible, there were nevertheless 
considerable differences between them. In the experimental 
condition students read one another's essays, commented on 
them, and received feedback from three other students; in the 
control condition the students received comments from the 
teacher only. The students in the experimental condition 
spent more time on the teaching program (two lessons per 
block and three-quarters of an hour of extra homework). 
Moreover, the logbook analysis and the questionnaire survey 
suggest that there were yet other differences between the two 
conditions. In the first place, in the control condition the 
lessons in which the teacher returned students' essays with 
feedback turned out to be anything but quiet: students kept 
asking for things to be explained and the teacher had to keep 
going round the class giving additional information. During 
these lessons students probably received a form of extra 
teaching that was denied to the students in the experimental 
condition. Second, it 
emerged that teachers wrote additional comments in the margin 
more often than students did, and were more likely to put the 
numbers of the questions from the feedback instrument in the 
margin of the essay. Thus the nature of the feedback may have 
differed between the two conditions. What value we should 
place on these two differences remains uncertain. The extra 
teaching that some students received in the control condition 
did not apply to the whole class. Those students who asked 
for and received further information may have been privileged 
in some way, but this can hardly be regarded as a systematic 
difference between the conditions. The additional written 
comments by teachers may have helped students understand the 
nuances in the feedback, whereas students in the experimental 
condition had to find their own way through a large amount of 
feedback from three commentators. We believe these 
differences to be inherent to the two conditions, and we do 
not regard them as blurring the contrast we were looking for. 
In our view the differences between the conditions are large 
enough to justify the expectation of differences in writing 
performance. 

A second explanation for the absence of differences may 
be that in the experimental condition students learnt from 
giving criticism, but that the additional gain compared with 
students in the control condition was nullified by the 
difference in the quality of the feedback in the two 
conditions. It seems reasonable to assume that students in 
the control condition received qualitatively better comments 
than those in the experimental condition, since their 
comments were given by an experienced commentator who spent a 
lot of time on each essay. Despite this, the questionnaire 
showed that students in both conditions were satisfied with 
the clarity, care taken and helpfulness of the feedback they 
received. Satisfaction can, of course, only be considered an 
operationalization of the perceived quality of the feedback: 
we do not know how perceived quality relates to objective 
qualities such as the usability and accuracy of the comments. 
Even so, it gives us an indication that in the judgement of 
the students there is not much difference between the two 
types of feedback at the center of our investigation. On 
other questions relating to how the feedback was used and its 
effect on the rewritten versions, there was again little 
difference between the answers in the two conditions. All in 
all, then, we conclude that there was no difference in the 
quality of the feedback given in the two conditions. This 
also casts some doubt on the added value of peer feedback, a 
point stressed in the theory. A comparison of peer evaluation 
and teacher feedback would need to include a further 
investigation of the quality and effectiveness of the 
feedback given by students and teachers; for practical 
reasons our survey gave us no opportunity to carry out such 
an investigation. 

It is even debatable whether feedback is relevant at all. 
It is certainly useful for revising a text that has already 
been written, but when one produces a new text the previous 
feedback may fail to provide adequate support. Writing a new 
text is a new problem-solving process, in which specific 
feedback given during a previous problem-solving process will 
naturally be of somewhat limited application. We have one 
piece of information that supports our supposition that 
feedback is of little value, even when it comes from a 
student's peers. We ranked the classes within each condition 
according to the proportion of students who expressed 
dissatisfaction with the usefulness of the feedback received, 
for both the second and the fourth block. We also ranked the 
classes on writing performance, on both the midtest and the 
posttest, for each Writing Performance Variable individually 
(Goal Orientation, Audience Orientation, Organization and 
Style) and for their sum. We then calculated the rank 
correlations between the degree to which students complained 
about the feedback and the various writing performance 
rankings. Of the ten rank correlations between perceived 
quality of feedback and writing performance in the control 
condition, only one was significant. Evidently, differences 
between classes in the evaluation of the feedback are 
unconnected with differences in writing performance. In the 
experimental condition we found three significant rank 
correlations between perceived usefulness and writing 
performance, namely for Style (twice) and for overall writing 
performance. These data, together with those from other 
surveys showing that the intensity, the tone, the manner of 
presentation and even the presence of feedback are all 
irrelevant (Wesdorp, 1983), lead us to suppose that feedback 
might be a much less important element of instruction in the 
teaching of written composition than we used to think. 
Accordingly, it is also quite possible that peer feedback is 
less instructive than claimed by those who practice it and 
whose ideas we used for our theory. 
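The class-level ranking analysis amounts to Spearman rank correlations between two orderings of the classes. A minimal sketch, assuming average ranks for ties (the source does not say how ties were treated); the eleven class values in the usage lines are hypothetical. With only eleven classes per condition, only fairly large rank correlations reach significance.

```python
import numpy as np


def ranks(values):
    """Ranks from 1 (smallest) to n; tied values receive their average rank."""
    v = np.asarray(values, dtype=float)
    order = np.argsort(v, kind="mergesort")
    r = np.empty(len(v))
    r[order] = np.arange(1, len(v) + 1)
    for val in np.unique(v):
        tied = v == val
        r[tied] = r[tied].mean()   # average rank within each group of ties
    return r


def spearman_rho(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return float(np.corrcoef(ranks(x), ranks(y))[0, 1])


# Hypothetical data for eleven classes in one condition.
pct_dissatisfied = [10, 25, 5, 40, 15, 30, 20, 35, 0, 45, 50]
mean_performance = [104, 101, 106, 99, 103, 102, 105, 100, 107, 98, 97]
print(round(spearman_rho(pct_dissatisfied, mean_performance), 2))
```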

If it is true that feedback differences between the 
conditions do not cause differences in performance, and if it 
is true that feedback contributes little or even nothing at 
all to the improvement of writing performance, then there 
still remains one difference between the conditions that 
might be expected to produce differences in performance. In 
the experimental condition students commented on one 
another's essays; in the control condition they did not. 
Thus, in the experimental condition students had more 
opportunity to internalize the criteria for a good text, 
which was also the purpose of the course. Much was expected 
of this learning activity, and secondary data did indeed show 
that students in the experimental condition had a better 
grasp of the criteria than their peers in the control 
condition. Particularly in the lessons referred to earlier, in 
which teachers in the control condition returned essays with 
comments, it was clear that students felt the need for a lot 
more information. This might be a sign of greater insight 
into the content of the criteria in the experimental 
condition and lesser insight in the control condition. 
However, in the experimental condition it was one of the 
rules of the game that students were not allowed to consult 
one another about the feedback, even though didactically 
there would be much to be said for such consultation, since 
much can be clarified in one-to-one interaction. It is 
therefore quite possible that even these students did not 
have as much insight as was previously thought. On the other 
hand, students in both conditions found 
the feedback clear, helpful and careful, and from this we 
deduce that students in both conditions still acquired 
criteria for good texts. It is conceivable that commenting on 
one anothers' essays gave students in the experimental 
condition an advantage over those in the control condition, 
but that this advantage was subsequently nullified by some 
other factor. It is possible, for example, that students in 
the control condition reached the knowledge level of those in 
the experimental condition by receiving feedback based on the 
criteria, and because the instruction texts in which the 
criteria were presented in context assumed the function of 
background knowledge* On the other hand, just as one may 
question the value of feedback, so one can also question how 
helpful it is to a writer to know by what criteria a text is 
judged to be good. Knowing what makes a good text does not 
make you a good writer. Our expectation that more criteria 
would be generated in the writing processes of the 
experimental group was not fulfilled (Ri jlaarsdam, 1986). 
Students generated so few new criteria that the relevant
cells remained virtually empty. Perhaps the supposed relation
between knowledge of criteria and writing performance is not
nearly as strong as the advocates of peer evaluation have
made it out to be. This agrees with Rubin's (1983) finding
that students know what is wrong with a text but are unable
to put it right. Further research is needed to investigate
the relation between knowledge of criteria and writing
performance.

To summarize, despite the differences between the two
teaching programs, we believe both led the students to learn
more about criteria, both through the feedback on their
essays and through the process of giving feedback
themselves. This might explain the similar writing
performance in the two conditions. We were doubtful about
the usefulness of feedback for new writing tasks and, by
extension, about the usefulness of knowing the criteria when
writing texts. Assuming for the moment that we are on the
right track, how can we explain the fact that, by and large,
writing performance did improve, and to such an extent that
it is difficult to attribute the progress solely to the
students' natural maturation? We suspect that the
instruction texts about the six aspects of writing (Goal
Orientation, Audience Orientation, Organization, Accuracy,
News Value and Style), texts which were unusually detailed
and comprehensive by the ordinary standards of teaching in
Dutch schools, contributed to this, as did, we think, the
innovation of a revision plan. We have found some evidence
to support these conjectures. As already reported, during
the second and fourth blocks we asked the students to fill
in questionnaires about their participation in specific
teaching/learning activities. We distinguished four clusters
of activities: Preparatory Activities (all activities
preceding the writing of the first version to be
submitted), Commenting Activities, Quantity of Comments, and
Comments Processing.
Model fitting (LISREL IV) showed that none of these activity
clusters was related to the improvement in written
composition skill at both testing moments.
In the posttest, Preparatory Activities accounted for over
5% of the improvement in written composition skill, while
Comments Processing accounted for over 8%. The most
interesting point is that the items that carried these
scales were related in content. One of the Preparatory
Activities consisted of the students evaluating their own
first version using the revision criteria defined in the
instruction texts, after which they had to rewrite their
essays before submitting them. The item that carried the
Comments Processing scale in the posttest comprised drawing
up a rewriting plan. Both activities call on students to
reflect on their own text and to apply their knowledge of
good texts to it. This 'reprocessing' appears to produce
results regardless of the feedback situation. That is,
feedback leads inter alia to reflection by writers on their
own texts, provided that it is followed by a rewriting phase.
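The statement that an activity cluster "accounted for over 5%" or "over 8%" of the improvement can be read as the increase in explained variance (R squared) when that cluster is added to a model that already contains the pretest score. The following is a minimal sketch under that assumption only: it uses ordinary least squares on synthetic data, not the LISREL IV structural models or the measurements of this study, and the function `r_squared` and the variable names are hypothetical.

```python
import numpy as np


def r_squared(X, y):
    """Share of the variance in y explained by an OLS fit on X (intercept added)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    return 1.0 - residuals.var() / y.var()


# Synthetic, illustrative data only: NOT the study's measurements.
rng = np.random.default_rng(seed=1)
n = 150
pretest = rng.normal(size=n)
preparatory = rng.normal(size=n)  # hypothetical activity-scale score
processing = rng.normal(size=n)   # hypothetical activity-scale score
posttest = 0.5 * pretest + 0.2 * preparatory + 0.3 * processing + rng.normal(size=n)

# Step 1: pretest only. Step 2: add one activity scale and record the rise in R^2.
r2_base = r_squared(pretest, posttest)
delta_prep = r_squared(np.column_stack([pretest, preparatory]), posttest) - r2_base
delta_proc = r_squared(np.column_stack([pretest, processing]), posttest) - r2_base

print(f"pretest alone:            R^2  = {r2_base:.3f}")
print(f"+ preparatory activities: dR^2 = {delta_prep:.3f}")
print(f"+ comments processing:    dR^2 = {delta_proc:.3f}")
```

Adding a predictor to a least-squares model can never lower R squared, so each increment is non-negative; whether an increment of a few percentage points is meaningful depends on sample size and measurement reliability, which this sketch ignores.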

Even if this study offers little encouragement to the
advocates of peer evaluation, in the sense that peer
evaluation does not lead to better results than intensive
teacher feedback, it does offer some help to curriculum
designers, because it shows that written composition skill
can increase appreciably in quite a short time (five
months). This goes against what many teachers believe, viz.
that written composition skill is an objective that teaching
can do very little to influence, and that the chief
ingredient in whatever improvement does occur is the
writer's maturation. On the other hand, we were only able to
account for 35% of the improvement: 25% was explained by the
initial measurement and 10% by the degree of participation
in the curriculum. The remaining 65% is an intriguing
statistic for curriculum designers.
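Read as a hierarchical decomposition of explained variance, with the initial measurement entered first and curriculum participation second (an interpretation; the exact model is not restated here), the percentages above add up as follows:

```latex
\begin{aligned}
R^2_{\text{explained}} &= R^2_{\text{pretest}} + \Delta R^2_{\text{participation}}
                        = 0.25 + 0.10 = 0.35, \\
1 - R^2_{\text{explained}} &= 1 - 0.35 = 0.65 .
\end{aligned}
```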
Appendix 1: Sources consulted in the design of a theory of
peer evaluation

Barry, 1980 
Bean, 1979 
Beaven, 1977 
Beck, 1978 
Bell, 1983 
Benesch, 1984 
Berkenkotter, 1983
Beyer & Brostoff, 1979
Bissland, 1980 
Blake & Tuttle, 1977 
Booher, 1982 
Bruffee, 1980
Buys, 1984 
Calkins, 1978 
Calzonetti, 1981 
Camplese & Mayo, 1982 
Christensen, 1977 
Clifton, 1980 
Coleman, 1978 
Collins, 1983 
Covington, 1979 
Crews, 1983
Crowhurst, 1979 
Damsma, 1985 
Danis, 1982 
Elbow, 1973 
Elias, 1982 
Ellman, 1975 
Engel, 1983
Flanigan & Menendez, 1980
Flynn, 1982
Forman, 1980
Freed, 1981
Gebhardt, 1980
Goldsmith, 1982
Golsby, 1981
Griffioen & Damsma, 1978
Griffioen et al., 1982
Gross, 1977
Gwyn & Swanson-Owens, 1980
Hafernik, 1984
Hansen & Vogt, 1982
Hawkins, 1978
Healy, 1980
Hoover, 1972
Howgate, 1982
Hurlow, 1983
Irwin, 1980
James, 1981
Jones, 1981
Lamberg, 1980
Laney, 1983
Langer & Applebee, 1983
Laque, 1977
Leidse Werkgroep Moedertaaldidactiek, 1980
Lewis, 1981
Lutkus, 1978
Maimon, 1979
Manzo & Sherk, 1977
Martin, 1983
Mazurek, 1979
Megna, 1976
Moffett, 1968
Nijmeegse Werkgroep Taaldidactiek, 1978
O'Donnell, 1980
Osborn, 1980
Parks, 1977
Pasternack, 1981
Peckham, 1978
Pianko & Radzik, 1980
Plevin, 1982
Popham & Zarem, 1978
Reid, 1983
Reigstad & McAndrew, 1984
Rivera-Hernandez, 1982
Roundy, 1984
Rijlaarsdam & Blok, 1981
Sager, 1973 
Schuster, 1983 
Sears, 1979 
Selfe, 1981 
Silver, 1978
Smelstor, 1978 
Smith, 1975 
Smith, 1982 
Smith, 1983 
Snipes, 1971
Soven, 1980 
Spigelmire, 1981 
Spina & Welhoelter, 1981
Steinacher, 1976 
Straver, 198jl 
Tremmel, 1983 
Turbill, 1983
Wagner, 1975 
Warner, 1979 
Weeks & White, 1982
Appendix 2: Effect studies consulted

1. Benson (1979) 

2. Bouton & Tutty (1975) 

3. Burt (1980) 

4. Carter (1982) 

5. Clifford (1981) 

6. Copland (1980) 

7. Delaney (1980) 

8. Earls (1983) 

9. Farrell (1977) 

10. Ford (1973) 

11. Fox (1978) 

12. Karegianes et al. (1980) 

13. Lagana (1972) 

14. Lyons (1976) 

15. Maize (1954) 

16. Myers (1979) 

17. Pfeifer (1981) 

18. Pierson (1966)

19. Sager (1973) 

20. Sears (1970) 

21. Ward (1959) 